A Generative Approach for Multi-Document Summarization using the Noisy Channel Model

Authors

  • Maria Lucía Castro Jorge
  • Thiago Alexandre Salgueiro Pardo
Abstract

Multi-document summarization is the automatic production of a single summary from a collection of texts. The task has become increasingly important because it supports information processing at a time when the volume of available information is growing considerably. In this paper, we propose a statistical generative approach to multi-document summarization. In particular, we formulate the task using a noisy-channel model. This approach is novel for multi-document summarization, and it explores the summarization process through the analysis of factors such as redundancy, complementarity, and contradiction. In this work, we model these factors using Cross-document Structure Theory (CST).
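
For orientation, the generic noisy-channel decomposition can be written as below. This is only a sketch of the standard formulation obtained from Bayes' rule, not the paper's actual model, which the abstract does not spell out. The symbols are assumptions introduced here: D stands for the source document collection, S for a candidate summary, P(S) for a source model over summaries, and P(D | S) for a channel model describing how the collection arises from the summary.

```latex
\hat{S} \;=\; \arg\max_{S} P(S \mid D)
        \;=\; \arg\max_{S} \frac{P(S)\, P(D \mid S)}{P(D)}
        \;=\; \arg\max_{S} P(S)\, P(D \mid S)
```

The denominator P(D) is dropped because it does not depend on S. Under this reading, factors such as redundancy, complementarity, and contradiction would presumably enter through the channel term, though how the authors parameterize it is not stated in this abstract.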


Similar resources

A Generative Approach for Multi-Document Summarization using Semantic-Discursive information

Multi-document summarization is the automatic production of a unique summary from a collection of texts. In this paper, we propose a statistical generative approach for multi-document summarization that combines simple information such as sentence position in the text and semantic-discursive information from CST (Cross-Document Structure Theory). In particular, we formulate the multi-document s...


Multi-candidate reduction: Sentence compression as a tool for document summarization tasks

This article examines the application of two single-document sentence compression techniques to the problem of multi-document summarization—a “parse-and-trim” approach and a statistical noisy-channel approach. We introduce the Multi-Candidate Reduction (MCR) framework for multi-document summarization, in which many compressed candidates are generated for each source sentence. These candidates a...


A Hybrid Hierarchical Model for Multi-Document Summarization

Scoring sentences in documents given abstract summaries created by humans is important in extractive multi-document summarization. In this paper, we formulate extractive summarization as a two step learning problem building a generative model for pattern discovery and a regression model for inference. We calculate scores for sentences in document clusters based on their latent characteristics u...


Extractive summarization using a latent variable model

Extractive multi-document summarization is the task of choosing sentences from a set of documents to compose a summary text in response to a user query. We propose a generative approach to explicitly identify summary and non-summary topic distributions in the sentences of a given set of documents (i.e., document cluster). Using these approximate summary topic probabilities as latent output vari...


EXTRACTION-BASED TEXT SUMMARIZATION USING FUZZY ANALYSIS

Due to the explosive growth of the world-wide web, automatic text summarization has become an essential tool for web users. In this paper we present a novel approach for creating text summaries. Using fuzzy logic and word-net, our model extracts the most relevant sentences from an original document. The approach utilizes fuzzy measures and inference on the extracted textual information from the docu...



Publication date: 2011